Supervised learning with Hidden Markov Models has been used to train acoustic models for automatic speech recognition for several years. Typically, clean transcriptions form the basis for this training regimen. However, results have shown that using readily available sources of transcriptions, which can at times be erroneous (e.g., closed captions), does not degrade performance significantly. This work analyzes the effects of mislabeled data on recognition accuracy. For this purpose, training is performed using manually corrupted training data, and the results are observed on three different databases: TIDigits, Alphadigits, and SWITCHBOARD. For Alphadigits, with 16% of the data mislabeled, system performance degrades by 12% relative to the baseline. For a complex task such as SWITCHBOARD, with 16% of the training data mislabeled, system performance degrades by 8.5% relative to the baseline. The training process is robust to mislabeled data because the Gaussian mixtures used to model the underlying distribution tend to cluster around the majority of the correct data; the outliers (incorrect data) do not contribute significantly to the reestimation process.
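The robustness argument in the last sentences can be illustrated with a minimal sketch, not taken from the paper itself: a toy 1-D data set in which 16% of the samples for a class are actually drawn from a different class, fitted with a two-component Gaussian mixture via plain EM. The dominant component locks onto the majority of the correct data, while the mislabeled outliers are absorbed by a low-weight component and barely influence its reestimated mean. All distribution parameters and the mislabeling rate here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy class data: 84% "correct" samples near 0, 16% mislabeled samples near 5
# (16% mirrors the corruption level studied in the abstract; the Gaussians
# themselves are arbitrary choices for illustration).
clean = rng.normal(0.0, 1.0, 840)
mislabeled = rng.normal(5.0, 1.0, 160)
x = np.concatenate([clean, mislabeled])

# Fit a 2-component 1-D Gaussian mixture with plain EM.
means = np.array([x.min(), x.max()])   # spread initialization
vars_ = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

for _ in range(100):
    # E-step: responsibility of each component for each sample
    dens = (weights / np.sqrt(2.0 * np.pi * vars_)
            * np.exp(-0.5 * (x[:, None] - means) ** 2 / vars_))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: reestimate weights, means, variances from soft counts
    nk = resp.sum(axis=0)
    weights = nk / len(x)
    means = (resp * x[:, None]).sum(axis=0) / nk
    vars_ = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk

dom = int(np.argmax(weights))
print(f"dominant component: mean={means[dom]:.2f}, weight={weights[dom]:.2f}")
print(f"outlier component:  mean={means[1 - dom]:.2f}, weight={weights[1 - dom]:.2f}")
```

The dominant component ends up with weight close to 0.84 and a mean near the correct class center, while the mislabeled samples are captured by the minority component: the same clustering effect the abstract credits for the robustness of HMM reestimation.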